MeSH Up: effective MeSH text classification for improved document retrieval

Authors

  • Dolf Trieschnigg
  • Piotr Pezik
  • Vivian Lee
  • Franciska de Jong
  • Wessel Kraaij
  • Dietrich Rebholz-Schuhmann
Abstract

MOTIVATION: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems.

RESULTS: We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language and a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone.

CONCLUSIONS: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations.
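
As a rough illustration of what a K-Nearest Neighbor approach to MeSH annotation can look like, the sketch below ranks headings for a new text by letting its most similar, already annotated documents vote with their similarity scores. It is a generic sketch and not the system evaluated in the paper: the function names, the bag-of-words cosine similarity, the value of k and the toy corpus are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercased term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_mesh(text, annotated_docs, k=2):
    """Rank headings by the summed similarity of the k nearest annotated docs."""
    query = bag_of_words(text)
    scored = [
        (cosine(query, bag_of_words(doc)), headings)
        for doc, headings in annotated_docs
    ]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for similarity, headings in neighbours:
        for heading in headings:
            votes[heading] += similarity
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Toy corpus invented for this example: (text, manually assigned headings).
TOY_CORPUS = [
    ("insulin therapy in patients with type 2 diabetes",
     ["Diabetes Mellitus, Type 2", "Insulin"]),
    ("insulin resistance and obesity in adults",
     ["Insulin Resistance", "Obesity"]),
    ("gene expression profiling in breast cancer",
     ["Breast Neoplasms", "Gene Expression Profiling"]),
]

ranked = knn_mesh("insulin treatment for diabetes", TOY_CORPUS, k=2)
for heading, score in ranked:
    print(f"{score:.3f}  {heading}")
```

The same ranked headings could be attached to a textual query as an additional MeSH representation, which is the query annotation setting the abstract evaluates for IR.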

Related articles

Response to comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'

In response to the methodological considerations, we emphasize that in our paper we compare different MeSH classification systems on two tasks: (i) reproducing manual MeSH recommendations (referred to as indexing by Névéol et al.) and (ii) translating a textual query to an additional MeSH representation (referred to as query expansion). We show that the approach we propose works well on both ta...

Comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'

Information retrieval is an important task that requires specific attention in the biomedical domain where controlled vocabularies are available to characterize and organize textual content. A recent article published in Bioinformatics (Trieschnigg et al., 2009) confirms that there is a continued interest in the community to address this problem and achieve ‘improved document retrieval’. As sho...

Text-Based Medical Case Retrieval Using MeSH Ontology

Our approach to the ImageCLEF medical case retrieval task consists of text-only retrieval combined with utilizing the Medical Subject Headings (MeSH) ontology. MeSH terms extracted from the query are used for query expansion or query term weighting. MeSH annotations of documents available from PubMed Central are added to the corpus. Retrieval results improve slightly upon full-text retrieval.

Evaluation of Automatically Assigned MeSH Terms for Retrieval of Medical Images

This paper presents the results of the State University of New York at Buffalo (UB) team in collaboration with the National Library of Medicine (NLM) in the 2007 ImageCLEFmed task. We use a system that combines visual features (using a CBIR System) and text retrieval. We used the Medical Text Indexer (MTI) developed by NLM to automatically assign MeSH terms and UMLS concepts to the English free...

The GUC Goes to TREC 2004: Using Whole or Partial Documents for Retrieval and Classification in the Genomics Track

We were interested in examining the relative effect of using parts of the documents, different combinations of parts of the documents, or whole documents on retrieval and classification. We were also interested in the effect of MeSH terms on retrieval. Our experiments show that indexing titles, abstracts, and MeSH terms for adhoc retrieval yielded statistically significantly better results than...

Journal:
  • Bioinformatics

Volume 25

Publication date: 2009